clear all
set more off

* Set the working directory to the Project/ folder before running this script

import excel "Data/InputData/ThesisMW.xlsx", firstrow clear


* Rename the imported date column. year() below requires a Stata daily date;
* import excel normally reads Excel date cells as %td dates.
rename observation_date year_string

* Extract the calendar year from the daily date
gen year = year(year_string)

* Verify conversion
list year_string year in 1/5

* Drop original year_string column after conversion
drop year_string
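
* Optional sanity check (an addition, not part of the original workflow):
* reshape long with i(year) needs exactly one row per year, so stop here
* if year is missing or duplicated.
isid year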

* Reshape from wide to long format
reshape long STTMINWG, i(year) j(state_code) string

* Rename state_code to state (it already holds correct values)
rename state_code state  

* Drop U.S. territories and the District of Columbia
drop if inlist(state, "PR", "GU", "VI", "DC")

* Drop suffixes captured by the reshape that are not state postal codes
drop if state == "OK_20230101" | state == "FG"

* Verify all unique state codes
tab state
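* Optional checks (additions, not part of the original workflow).
* tabulate leaves the number of distinct values in r(r).
display "Distinct state codes remaining: " r(r)
* Count any codes that are not exactly two capital letters; a nonzero
* count points to suffixes that still need to be dropped.
count if !regexm(state, "^[A-Z][A-Z]$")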

* Rename the minimum wage column
rename STTMINWG state_min_wage

* Ensure state_min_wage is numeric (force sets nonnumeric entries to missing)
destring state_min_wage, replace force
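
* Optional checks before saving (additions, not part of the original workflow).
* Report how many wage values are missing after the conversion.
count if missing(state_min_wage)
* Each state-year pair should appear exactly once in the long panel.
isid state year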

* Save cleaned dataset
save "Data/IntermediateData/cleaned_min_wage.dta", replace

* Verify the cleaned data
list year state state_min_wage if year == 2018, clean
summarize state_min_wage

